In this investigation, I look at the relationship between the services offered with a villa and its price, to see whether prices are in any way related to those services.
The dataset contains the details of 977 villas with 38 attributes. Most variables are categorical, but area, price, age of property, locality score, project score and builder experience are numerical. Some outliers, such as an age of property of 122 years and builder experience greater than 250, were removed.
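The outlier removal mentioned above can be sketched as a simple row filter. This is only an illustration: the rows are made up, and treating the two values from the text as cutoffs (`MAX_AGE`, `MAX_EXPERIENCE`) is an assumption, since the notebook does not show the exact conditions it used.

```python
# Made-up rows; only the two column names come from the dataset.
villas = [
    {"age of property": 0, "builder_experience": 52},
    {"age of property": 122, "builder_experience": 186},  # implausible age
    {"age of property": 5, "builder_experience": 2022},   # implausible experience
    {"age of property": 12, "builder_experience": 250},
]

# Assumed cutoffs, taken from the values named in the text.
MAX_AGE = 122          # rows at or above this age are dropped
MAX_EXPERIENCE = 250   # rows above this value are dropped

cleaned = [
    row for row in villas
    if row["age of property"] < MAX_AGE
    and row["builder_experience"] <= MAX_EXPERIENCE
]

print(len(villas), "rows before,", len(cleaned), "rows after")
```

In a pandas workflow the same filter would normally be written as a boolean mask on the DataFrame.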
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| area | 919.0 | 2.649709e+03 | 1.566466e+03 | 522.0 | 1.518500e+03 | 2.350000e+03 | 3.300000e+03 | 11500.0 |
| price | 919.0 | 2.492829e+07 | 2.612990e+07 | 1800000.0 | 7.650000e+06 | 1.950000e+07 | 3.150000e+07 | 240000000.0 |
| status | 919.0 | 1.327530e-01 | 3.394923e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| new/resale | 919.0 | 4.646355e-01 | 4.990194e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| price_negotiable | 919.0 | 4.570185e-02 | 2.089514e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| furnished | 919.0 | 1.751904e-01 | 3.803369e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| age of property | 919.0 | 3.165397e+00 | 6.302718e+00 | 0.0 | 0.000000e+00 | 0.000000e+00 | 5.000000e+00 | 122.0 |
| Lift(s) | 919.0 | 3.841132e-01 | 4.866497e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Full Power Backup | 919.0 | 6.332971e-01 | 4.821668e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.0 |
| 24 X 7 Security | 919.0 | 3.090316e-01 | 4.623458e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Children's play area | 919.0 | 7.094668e-01 | 4.542556e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.0 |
| Club House | 919.0 | 5.146899e-01 | 5.000563e-01 | 0.0 | 0.000000e+00 | 1.000000e+00 | 1.000000e+00 | 1.0 |
| Gymnasium | 919.0 | 4.733406e-01 | 4.995606e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Swimming Pool | 919.0 | 4.744287e-01 | 4.996176e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Sports Facility | 919.0 | 4.047878e-01 | 4.911182e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Jogging Track | 919.0 | 2.883569e-01 | 4.532447e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Landscaped Gardens | 919.0 | 1.099021e-01 | 3.129380e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| locality_score | 919.0 | 8.023009e+00 | 9.509524e-01 | 5.3 | 8.000000e+00 | 8.015749e+00 | 8.300000e+00 | 9.7 |
| project_score | 919.0 | 8.036305e+00 | 2.536038e-01 | 6.7 | 8.039148e+00 | 8.040040e+00 | 8.081644e+00 | 8.8 |
| builder_experience | 919.0 | 1.881241e+02 | 3.268600e+02 | 12.0 | 5.200000e+01 | 1.859448e+02 | 1.862927e+02 | 2022.0 |
| Rain Water Harvesting | 919.0 | 2.763874e-01 | 4.474542e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Car Parking | 919.0 | 3.275299e-01 | 4.695679e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Vaastu Compliant | 919.0 | 1.566921e-01 | 3.637081e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| Golf Course | 919.0 | 3.590860e-02 | 1.861636e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| Intercom | 919.0 | 4.755169e-01 | 4.996721e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 |
| Indoor Games | 919.0 | 1.741023e-01 | 3.794039e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| Maintenance Staff | 919.0 | 1.055495e-01 | 3.074275e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| Multipurpose Room | 919.0 | 1.436344e-01 | 3.509096e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| ATM | 919.0 | 5.223069e-02 | 2.226130e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| Cafeteria | 919.0 | 7.399347e-02 | 2.619028e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| Staff Quarter | 919.0 | 6.528836e-02 | 2.471685e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| Hospital | 919.0 | 2.611534e-02 | 1.595651e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| School | 919.0 | 2.829162e-02 | 1.658950e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
| Shopping Mall | 919.0 | 4.134929e-02 | 1.992052e-01 | 0.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.0 |
Skewness is : 1.6798244071303055
Figure: Analysis of Area - Skewness
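The skewness values printed in this section can be reproduced in principle with the bias-corrected sample skewness formula. The sketch below is a stdlib version under assumptions: the notebook's code is not shown, the function name `sample_skewness` is hypothetical, and the area values are made up. As far as I know, pandas' `Series.skew` uses this same bias-corrected form.

```python
import math

def sample_skewness(values):
    """Bias-corrected (adjusted Fisher-Pearson) sample skewness."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5                            # biased estimate
    return math.sqrt(n * (n - 1)) / (n - 2) * g1   # bias correction

# Made-up areas with one very large villa, giving a long right tail.
areas = [900, 1200, 1500, 1800, 2400, 3300, 11500]
area_skew = sample_skewness(areas)
print("Skewness is :", area_skew)
```

A positive value indicates a long right tail (as reported for area, price and builder experience), while a negative value indicates a long left tail (as reported for the locality and project scores).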
| | location | area | price | price_currency | status | new/resale | price_negotiable | description | facing | furnished | ... | Intercom | Indoor Games | Maintenance Staff | Multipurpose Room | ATM | Cafeteria | Staff Quarter | Hospital | School | Shopping Mall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | Magarpatta | 10000 | 130000000 | INR | 1.0 | 0 | 0 | A spacious 8 bhk villa is available for sale i... | unknown | 1 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 322 | Tungarli | 11000 | 60000000 | INR | 0.0 | 0 | 0 | Well designed 6 bhk villa is available at a pr... | northeast | 1 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 805 | Khandala | 11500 | 200000000 | INR | 0.0 | 0 | 1 | Well designed 7 bhk villa is available at a pr... | east | 0 | ... | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 896 | Valvan Lonavla | 11000 | 70000000 | INR | 0.0 | 0 | 1 | It’s a 6 bhk villa situated in Valvan Lonavla.... | east | 1 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 rows × 38 columns
Skewness is : 3.446796049159059
Figure: Analysis of Price - Skewness
Figure: Analysis of facing
Figure: Analysis of age of property
Skewness is : -1.3706736591442539
Figure: Analysis of Locality score - Skewness
Skewness is : -0.7508526031710991
Figure: Analysis of Project score - Skewness
Skewness is : 5.172789603406742
Figure: Analysis of Builder Experience - Skewness
Figures: one plot each for new/resale, price currency, price negotiable, status, furnished, Gymnasium, 24 X 7 Security, Swimming Pool, Lift(s), Jogging Track, Club House, Landscaped Gardens, Children's play area, Sports Facility, Car Parking, Full Power Backup, Indoor Games, Intercom, Maintenance Staff, Shopping Mall, Cafeteria, ATM, Multipurpose Room, School, Hospital, Golf Course and Staff Quarter.
Bivariate analysis examines two variables together in order to determine the empirical relationship between them.
The pair plot shows the relationship between every pair of variables.
Figure: Pairplot
From the pair plot above, we observed the following.
Correlation analysis is concerned with finding out whether a relationship exists between variables and, if so, determining the magnitude and direction of that relationship.
| | area | price | status | new/resale | price_negotiable | furnished | age of property | Lift(s) | Full Power Backup | 24 X 7 Security | ... | Intercom | Indoor Games | Maintenance Staff | Multipurpose Room | ATM | Cafeteria | Staff Quarter | Hospital | School | Shopping Mall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| area | 1.000000 | 0.793453 | 0.097484 | -0.307463 | 0.102195 | 0.170988 | 0.140596 | 0.205812 | 0.254673 | 0.190184 | ... | 0.315960 | 0.162077 | 0.065576 | 0.102418 | 0.034531 | 0.164943 | 0.065662 | -0.033823 | -0.027773 | -0.026709 |
| price | 0.793453 | 1.000000 | 0.112521 | -0.318050 | 0.076041 | 0.173623 | 0.093551 | 0.127169 | 0.180562 | 0.088652 | ... | 0.204398 | 0.074818 | 0.039846 | 0.009506 | 0.024413 | 0.108878 | 0.028044 | -0.009604 | -0.014931 | -0.011668 |
| status | 0.097484 | 0.112521 | 1.000000 | -0.223027 | 0.114010 | 0.047470 | 0.138892 | 0.014098 | 0.018219 | 0.099229 | ... | -0.000084 | 0.065624 | 0.126530 | 0.040934 | -0.005364 | -0.000333 | -0.012530 | 0.056585 | 0.068632 | 0.063711 |
| new/resale | -0.307463 | -0.318050 | -0.223027 | 1.000000 | 0.025964 | -0.331777 | -0.377042 | -0.085300 | -0.205622 | -0.198094 | ... | -0.266692 | -0.019227 | -0.043098 | -0.070493 | 0.006840 | 0.145067 | -0.069578 | -0.029430 | -0.027377 | -0.051023 |
| price_negotiable | 0.102195 | 0.076041 | 0.114010 | 0.025964 | 1.000000 | -0.032321 | 0.075315 | 0.020003 | -0.049720 | 0.079163 | ... | -0.124906 | 0.036931 | 0.077445 | 0.044085 | -0.004536 | 0.097383 | 0.026531 | 0.029508 | 0.088360 | 0.033062 |
| furnished | 0.170988 | 0.173623 | 0.047470 | -0.331777 | -0.032321 | 1.000000 | 0.175576 | 0.059782 | 0.142794 | 0.051081 | ... | 0.157295 | -0.060622 | 0.102541 | -0.009183 | 0.020468 | -0.119341 | 0.063600 | 0.032227 | 0.024948 | 0.062439 |
| age of property | 0.140596 | 0.093551 | 0.138892 | -0.377042 | 0.075315 | 0.175576 | 1.000000 | 0.165719 | 0.208526 | 0.399250 | ... | 0.252061 | 0.170617 | 0.152893 | 0.136514 | 0.092437 | -0.031179 | 0.157386 | 0.126762 | 0.122623 | 0.155056 |
| Lift(s) | 0.205812 | 0.127169 | 0.014098 | -0.085300 | 0.020003 | 0.059782 | 0.165719 | 1.000000 | 0.452385 | 0.444985 | ... | 0.569569 | 0.286389 | 0.187423 | 0.346355 | 0.277147 | 0.187005 | 0.316545 | 0.179299 | 0.148599 | 0.229271 |
| Full Power Backup | 0.254673 | 0.180562 | 0.018219 | -0.205622 | -0.049720 | 0.142794 | 0.208526 | 0.452385 | 1.000000 | 0.455141 | ... | 0.688382 | 0.313648 | 0.232003 | 0.292325 | 0.168486 | 0.215101 | 0.201109 | 0.110450 | 0.102605 | 0.146695 |
| 24 X 7 Security | 0.190184 | 0.088652 | 0.099229 | -0.198094 | 0.079163 | 0.051081 | 0.399250 | 0.444985 | 0.455141 | 1.000000 | ... | 0.504312 | 0.525083 | 0.352721 | 0.484819 | 0.340442 | 0.197785 | 0.395191 | 0.215331 | 0.226741 | 0.251413 |
| Children's play area | 0.006799 | -0.065953 | -0.116938 | 0.125220 | -0.089490 | -0.039244 | 0.092137 | 0.406819 | 0.477902 | 0.344974 | ... | 0.484545 | 0.287492 | 0.204227 | 0.241577 | 0.139453 | 0.180893 | 0.169126 | 0.089763 | 0.080282 | 0.108827 |
| Club House | 0.241640 | 0.155450 | 0.001334 | -0.151795 | -0.027283 | 0.069504 | 0.188979 | 0.632570 | 0.584850 | 0.503335 | ... | 0.750214 | 0.434354 | 0.298140 | 0.379059 | 0.218169 | 0.274490 | 0.247822 | 0.145360 | 0.152559 | 0.190734 |
| Gymnasium | 0.300888 | 0.150523 | 0.033737 | -0.197145 | -0.061365 | 0.067607 | 0.272298 | 0.568658 | 0.630950 | 0.554502 | ... | 0.794902 | 0.467061 | 0.348164 | 0.407138 | 0.198646 | 0.298173 | 0.252311 | 0.104403 | 0.101121 | 0.164338 |
| Swimming Pool | 0.356388 | 0.209707 | 0.026458 | -0.194784 | -0.020097 | 0.146852 | 0.247649 | 0.575833 | 0.655146 | 0.524687 | ... | 0.770919 | 0.443020 | 0.311915 | 0.375132 | 0.198111 | 0.289198 | 0.242885 | 0.117699 | 0.113880 | 0.174812 |
| Sports Facility | 0.279877 | 0.143833 | -0.048244 | -0.190436 | -0.095548 | 0.109808 | 0.196537 | 0.574783 | 0.590724 | 0.436755 | ... | 0.786183 | 0.375521 | 0.171249 | 0.401804 | 0.185028 | 0.241149 | 0.266637 | 0.115168 | 0.099949 | 0.118227 |
| Jogging Track | 0.198497 | 0.086632 | 0.069523 | -0.072862 | 0.044732 | 0.028907 | 0.278814 | 0.455393 | 0.429551 | 0.769896 | ... | 0.514606 | 0.619927 | 0.430206 | 0.526943 | 0.368789 | 0.361484 | 0.415188 | 0.257253 | 0.253569 | 0.326264 |
| Landscaped Gardens | 0.083397 | 0.013896 | 0.067590 | -0.041353 | 0.106354 | -0.015506 | 0.155358 | 0.158827 | 0.252946 | 0.337203 | ... | 0.243638 | 0.435030 | 0.388822 | 0.441362 | 0.120790 | 0.419022 | 0.005716 | 0.007905 | 0.023974 | 0.119239 |
| locality_score | 0.120238 | 0.173677 | 0.034737 | -0.408020 | -0.044342 | 0.017487 | 0.124899 | 0.034319 | -0.066436 | -0.035903 | ... | -0.064406 | -0.146128 | -0.219317 | -0.199737 | -0.059848 | -0.339653 | 0.063032 | 0.055016 | 0.062266 | 0.001121 |
| project_score | 0.003288 | -0.021737 | -0.095453 | -0.082216 | -0.075545 | -0.030654 | -0.030605 | 0.101267 | 0.057957 | 0.102147 | ... | 0.105365 | 0.226262 | -0.051753 | 0.196414 | 0.171820 | -0.007015 | 0.281906 | 0.063541 | 0.092886 | 0.068727 |
| builder_experience | 0.030644 | 0.007455 | 0.039760 | -0.033992 | -0.045322 | 0.004896 | 0.044081 | -0.041409 | -0.011121 | -0.043546 | ... | -0.000787 | -0.107873 | -0.005541 | -0.150704 | 0.049959 | -0.002207 | -0.087706 | 0.141317 | 0.133889 | 0.077666 |
| Rain Water Harvesting | 0.050226 | -0.022825 | -0.005158 | 0.019429 | 0.074470 | -0.047996 | 0.196216 | 0.542454 | 0.313762 | 0.681917 | ... | 0.405457 | 0.518323 | 0.239076 | 0.510036 | 0.357972 | 0.169229 | 0.427635 | 0.234451 | 0.232068 | 0.274940 |
| Car Parking | 0.172775 | 0.072986 | 0.130115 | -0.157387 | 0.113729 | 0.056528 | 0.425569 | 0.445149 | 0.444455 | 0.777628 | ... | 0.459024 | 0.523367 | 0.462038 | 0.533940 | 0.336374 | 0.334181 | 0.369310 | 0.234642 | 0.230512 | 0.297588 |
| Vaastu Compliant | 0.093096 | 0.007914 | 0.043084 | -0.179501 | 0.077673 | 0.045458 | 0.262397 | 0.398116 | 0.303161 | 0.579773 | ... | 0.356798 | 0.465193 | 0.231875 | 0.480669 | 0.544604 | 0.335581 | 0.564655 | 0.361125 | 0.377796 | 0.466772 |
| Golf Course | 0.138415 | 0.110404 | 0.027907 | 0.054725 | 0.041777 | -0.058175 | 0.007930 | 0.196282 | 0.122585 | 0.237957 | ... | 0.179264 | 0.250690 | 0.009838 | 0.354513 | 0.454114 | 0.459312 | 0.209407 | 0.005068 | 0.002341 | -0.010708 |
| Intercom | 0.315960 | 0.204398 | -0.000084 | -0.266692 | -0.124906 | 0.157295 | 0.252061 | 0.569569 | 0.688382 | 0.504312 | ... | 1.000000 | 0.373019 | 0.261493 | 0.380412 | 0.217165 | 0.221958 | 0.251103 | 0.130992 | 0.113496 | 0.185284 |
| Indoor Games | 0.162077 | 0.074818 | 0.065624 | -0.019227 | 0.036931 | -0.060622 | 0.170617 | 0.286389 | 0.313648 | 0.525083 | ... | 0.373019 | 1.000000 | 0.318582 | 0.491072 | 0.175962 | 0.418346 | 0.308454 | 0.086757 | 0.077420 | 0.135253 |
| Maintenance Staff | 0.065576 | 0.039846 | 0.126530 | -0.043098 | 0.077445 | 0.102541 | 0.152893 | 0.187423 | 0.232003 | 0.352721 | ... | 0.261493 | 0.318582 | 1.000000 | 0.212731 | 0.110363 | 0.322303 | 0.095577 | 0.054779 | 0.090898 | 0.248831 |
| Multipurpose Room | 0.102418 | 0.009506 | 0.040934 | -0.070493 | 0.044085 | -0.009183 | 0.136514 | 0.346355 | 0.292325 | 0.484819 | ... | 0.380412 | 0.491072 | 0.212731 | 1.000000 | 0.336146 | 0.382051 | 0.268544 | 0.205301 | 0.192092 | 0.211028 |
| ATM | 0.034531 | 0.024413 | -0.005364 | 0.006840 | -0.004536 | 0.020468 | 0.092437 | 0.277147 | 0.168486 | 0.340442 | ... | 0.217165 | 0.175962 | 0.110363 | 0.336146 | 1.000000 | 0.400737 | 0.472494 | 0.666895 | 0.638368 | 0.688177 |
| Cafeteria | 0.164943 | 0.108878 | -0.000333 | 0.145067 | 0.097383 | -0.119341 | -0.031179 | 0.187005 | 0.215101 | 0.197785 | ... | 0.221958 | 0.418346 | 0.322303 | 0.382051 | 0.400737 | 1.000000 | 0.295501 | 0.136174 | 0.152340 | 0.191844 |
| Staff Quarter | 0.065662 | 0.028044 | -0.012530 | -0.069578 | 0.026531 | 0.063600 | 0.157386 | 0.316545 | 0.201109 | 0.395191 | ... | 0.251103 | 0.308454 | 0.095577 | 0.268544 | 0.472494 | 0.295501 | 1.000000 | 0.343404 | 0.353398 | 0.387591 |
| Hospital | -0.033823 | -0.009604 | 0.056585 | -0.029430 | 0.029508 | 0.032227 | 0.126762 | 0.179299 | 0.110450 | 0.215331 | ... | 0.130992 | 0.086757 | 0.054779 | 0.205301 | 0.666895 | 0.136174 | 0.343404 | 1.000000 | 0.918543 | 0.754209 |
| School | -0.027773 | -0.014931 | 0.068632 | -0.027377 | 0.088360 | 0.024948 | 0.122623 | 0.148599 | 0.102605 | 0.226741 | ... | 0.113496 | 0.077420 | 0.090898 | 0.192092 | 0.638368 | 0.152340 | 0.353398 | 0.918543 | 1.000000 | 0.755668 |
| Shopping Mall | -0.026709 | -0.011668 | 0.063711 | -0.051023 | 0.033062 | 0.062439 | 0.155056 | 0.229271 | 0.146695 | 0.251413 | ... | 0.185284 | 0.135253 | 0.248831 | 0.211028 | 0.688177 | 0.191844 | 0.387591 | 0.754209 | 0.755668 | 1.000000 |
34 rows × 34 columns
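Each entry in the matrix above is a pairwise correlation coefficient. As a minimal sketch, assuming the matrix holds Pearson coefficients (the default of pandas' `DataFrame.corr`, though the notebook's code is not shown), one entry can be computed by hand as follows. The helper name `pearson_r` is hypothetical, and the five area and price pairs are the sample rows shown in the notebook's selected-feature preview, so the result will differ from the full-data value of 0.793453.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Five (area, price) pairs from the notebook's preview rows.
area = [1780, 2225, 1785, 1200, 4000]
price = [6500000, 17500000, 6500000, 15000000, 35000000]

r = pearson_r(area, price)
print(f"r(area, price) on five rows = {r:.3f}")
```

The sign gives the direction of the relationship and the absolute value gives its strength, which is how the matrix is read in the rest of this section.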
The matrix above shows linear relationships among the features listed below.
We can plot a heatmap to confirm these findings.
A heatmap is a graphical representation of data in which values are depicted by color. Here we use it to confirm the correlations.
Figure: Heatmap
Here we look at the relationship between price and age of property. The plot shows no real relationship between them.
Figure: Bivariate Analysis of age of property and price
Here we look at the relationship between area and location. The plot shows no clear relationship between them.
Here we look at the relationship between builder experience and price. The plot shows no apparent relationship between them.
Figure: Bivariate Analysis of builder experience and price
Figure: Bivariate Analysis of builder experience < 250 and price
Here we look at the relationship between locality score and price. The plot shows no apparent relationship between them.
Figure: Bivariate Analysis of locality score and price
Here we look at the relationship between project score and price. The plot shows no apparent relationship between them.
Figure: Bivariate Analysis of project score and price
Here we look at the relationship between facing and price. The plot shows no apparent relationship between them.
Figure: Bivariate Analysis of facing and price
Here we look at the relationship between area and price. The plot shows a linear relationship between them: as the area increases, the price increases.
Figure: Bivariate Analysis of area and price
Of all the attributes, only area has a clear linear relationship with price (correlation 0.79): as the area increases, the price increases.
Index(['location', 'area', 'price', 'price_currency', 'status', 'new/resale',
'price_negotiable', 'description', 'furnished', 'age of property',
'Lift(s)', 'Full Power Backup', '24 X 7 Security',
'Children's play area', 'Club House', 'Gymnasium', 'Swimming Pool',
'Sports Facility', 'Jogging Track', 'Landscaped Gardens',
'locality_score', 'project_score', 'builder_experience',
'Rain Water Harvesting', 'Car Parking', 'Vaastu Compliant',
'Golf Course', 'Intercom', 'Indoor Games', 'Maintenance Staff',
'Multipurpose Room', 'ATM', 'Cafeteria', 'Staff Quarter', 'Hospital',
'School', 'Shopping Mall', 'east', 'north', 'northeast', 'northwest',
'south', 'southeast', 'southwest', 'unknown', 'west'],
dtype='object')
Checking the correlation once more using a heatmap.
The correlations were used to select the desired features. Based on the correlation, 8 columns were chosen as the desired features; these features are used for the modelling.
| | price | area | new/resale | Swimming Pool | Intercom | Full Power Backup | locality_score | furnished | Club House |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6500000 | 1780 | 0 | 0 | 0.0 | 0 | 8.000000 | 0 | 0 |
| 1 | 17500000 | 2225 | 0 | 0 | 0.0 | 0 | 8.015749 | 1 | 0 |
| 2 | 6500000 | 1785 | 0 | 0 | 0.0 | 0 | 8.000000 | 0 | 0 |
| 3 | 15000000 | 1200 | 0 | 0 | 0.0 | 0 | 8.015749 | 1 | 0 |
| 4 | 35000000 | 4000 | 0 | 0 | 0.0 | 0 | 8.015749 | 1 | 0 |
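One way to arrive at the eight columns above is to rank features by the absolute value of their correlation with price and keep the top eight. The notebook does not show how the selection was actually made, so this is an assumption, but the ranking does reproduce the chosen columns in the same order. The coefficients are copied from the price column of the correlation matrix; `TOP_K` and the variable names are illustrative.

```python
# Correlation of each feature with price, copied from the correlation matrix.
# Only a subset is listed; every omitted feature has |r| below 0.1 there.
corr_with_price = {
    "area": 0.793453,
    "status": 0.112521,
    "new/resale": -0.318050,
    "furnished": 0.173623,
    "Lift(s)": 0.127169,
    "Full Power Backup": 0.180562,
    "Club House": 0.155450,
    "Gymnasium": 0.150523,
    "Swimming Pool": 0.209707,
    "Sports Facility": 0.143833,
    "locality_score": 0.173677,
    "Golf Course": 0.110404,
    "Intercom": 0.204398,
    "Cafeteria": 0.108878,
}

TOP_K = 8  # number of features to keep (matches the text)

# Rank by strength of the relationship, ignoring its direction.
ranked = sorted(
    corr_with_price,
    key=lambda name: abs(corr_with_price[name]),
    reverse=True,
)
selected = ranked[:TOP_K]
print(selected)
```

Ranking on the absolute value matters here: new/resale is negatively correlated with price, yet it is the second strongest relationship in the column.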
The data is split into training and test sets. The resulting shapes of X_train, X_test, y_train and y_test are shown below.
(735, 8) (184, 8) (735,) (184,)
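The shapes above are consistent with holding out 20% of the 919 rows for testing. The sketch below illustrates such a split with the standard library only. The 20% share, the seed and the helper name `train_test_indices` are assumptions, since the notebook shows only the resulting shapes and most likely used a library routine instead.

```python
import math
import random

def train_test_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into train and test index lists."""
    n_test = math.ceil(n_rows * test_fraction)  # round the test size up
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_indices(919)
print(len(train_idx), len(test_idx))  # prints: 735 184
```

Shuffling before splitting avoids a test set made up only of the last rows of the file, which could share a location or builder.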
We selected 6 models to work with: Linear Regression, Ridge Regressor, Lasso Regressor, Elastic Net, SVR Regressor and Random Forest Regressor.
| Model Name | Model | time |
|---|---|---|
| Linear Regression | LinearRegression() | 0.5953559875488281 |
| Ridge Regressor | Ridge() | 0.16841793060302734 |
| Lasso Regressor | Lasso() | 0.252666711807251 |
| Elastic Net | ElasticNet() | 0.01598834991455078 |
| SVR Regressor | SVR() | 10.357428550720215 |
| Random Forest Regressor | RandomForestRegressor() | 470.0197710990906 |
{'model_names': dict_keys(['Linear Regression', 'Ridge Regressor', 'Lasso Regressor', 'Elastic Net', 'SVR Regressor', 'Random Forest Regressor']),
'best_params': [{'fit_intercept': False, 'n_jobs': 2},
{'alpha': 0.25, 'solver': 'auto'},
{'alpha': 1.0, 'tol': 0.0001},
{'alpha': 0.25},
{'C': 100000, 'epsilon': 0.5, 'kernel': 'poly'},
{'criterion': 'squared_error',
'min_samples_leaf': 3,
'min_samples_split': 10,
'n_estimators': 50}],
'best_models': [LinearRegression(fit_intercept=False, n_jobs=2),
Ridge(alpha=0.25),
Lasso(),
ElasticNet(alpha=0.25),
SVR(C=100000, epsilon=0.5, kernel='poly'),
RandomForestRegressor(min_samples_leaf=3, min_samples_split=10, n_estimators=50)],
'best_mae': [9160832.829474986,
9202305.62213634,
9149756.873674182,
16093489.214604214,
13902783.439677324,
6755335.8286635885],
'best_mse': [308640837606585.2,
313231897628034.0,
307359853885892.06,
789616362996044.5,
916945486174921.6,
222688499449941.12],
'best_r2': [0.6698591868880606,
0.6649483128109922,
0.6712294041620035,
0.15537882106744094,
0.019179928577758742,
0.7617989801699765]}
For Best Model idx mse: 5 mae: 5 R2: 5
Below are the summaries for the 6 models used
| | model_names | best_params | best_models | best_mae | best_mse | best_r2 |
|---|---|---|---|---|---|---|
| 0 | Linear Regression | {'fit_intercept': False, 'n_jobs': 2} | LinearRegression(fit_intercept=False, n_jobs=2) | 9.160833e+06 | 3.086408e+14 | 0.669859 |
| 1 | Ridge Regressor | {'alpha': 0.25, 'solver': 'auto'} | Ridge(alpha=0.25) | 9.202306e+06 | 3.132319e+14 | 0.664948 |
| 2 | Lasso Regressor | {'alpha': 1.0, 'tol': 0.0001} | Lasso() | 9.149757e+06 | 3.073599e+14 | 0.671229 |
| 3 | Elastic Net | {'alpha': 0.25} | ElasticNet(alpha=0.25) | 1.609349e+07 | 7.896164e+14 | 0.155379 |
| 4 | SVR Regressor | {'C': 100000, 'epsilon': 0.5, 'kernel': 'poly'} | SVR(C=100000, epsilon=0.5, kernel='poly') | 1.390278e+07 | 9.169455e+14 | 0.019180 |
| 5 | Random Forest Regressor | {'criterion': 'squared_error', 'min_samples_le... | (DecisionTreeRegressor(max_features='auto', mi... | 6.755336e+06 | 2.226885e+14 | 0.761799 |
The mean squared error (MSE) of an estimator measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. SVR Regressor and Elastic Net had the highest MSE on the test data.
Mean absolute error (MAE) is a metric used to evaluate the performance of regression models. It is defined as the average of the absolute differences between actual and predicted values. Elastic Net and SVR Regressor had the highest MAE on the test data.
Root mean square error (RMSE) is a frequently used measure of the differences between the values predicted by a model and the values actually observed. It is the square root of the MSE, so by the MSE values reported above, SVR Regressor has the highest RMSE and Random Forest Regressor the lowest.
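The metrics defined above can be written out directly. The sketch below uses made-up targets and predictions to show the formulas, then takes the square root of the test MSE values from the summary table to obtain the corresponding RMSE values. The function name `regression_metrics` is hypothetical; the notebook most likely relied on library functions for these.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot,
    }

# Made-up targets and predictions, only to exercise the formulas.
metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(metrics)

# Test MSE per model, copied from the summary table.
reported_mse = {
    "Linear Regression": 3.086408e14,
    "Ridge Regressor": 3.132319e14,
    "Lasso Regressor": 3.073599e14,
    "Elastic Net": 7.896164e14,
    "SVR Regressor": 9.169455e14,
    "Random Forest Regressor": 2.226885e14,
}
# RMSE is the square root of MSE, so it is in the same unit as price.
reported_rmse = {name: math.sqrt(v) for name, v in reported_mse.items()}
for name, value in sorted(reported_rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:,.0f}")
```

Because the square root preserves order, ranking the models by RMSE always gives the same order as ranking them by MSE.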
Model tuning is the systematic modification of model parameters to identify the best-performing model. Based on the best parameters found above, we tune the model as suggested.
{'criterion': 'squared_error', 'min_samples_leaf': 3, 'min_samples_split': 10, 'n_estimators': 50}
After tuning the parameters as suggested, the result is shown below.
0.6861450833517937
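The idea behind this kind of tuning is a grid search: fit the model once per candidate parameter value and keep the value with the lowest validation error. The `best_params` output earlier suggests the notebook ran a search of this kind, but its code is not shown, so the sketch below is only an illustration. It uses a one-feature ridge regression without an intercept; the data, the grid and the helper names are all made up.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for one feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def validation_mse(xs, ys, slope):
    """Mean squared error of the line y = slope * x on the given data."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training and validation data (y is roughly 2 * x).
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
x_val, y_val = [5.0, 6.0], [10.1, 11.9]

param_grid = [0.0, 0.25, 1.0, 10.0]  # candidate alpha values (hypothetical)

# Fit once per candidate and record the validation error.
results = {
    alpha: validation_mse(x_val, y_val, fit_ridge_1d(x_train, y_train, alpha))
    for alpha in param_grid
}
best_alpha = min(results, key=results.get)
print("best alpha:", best_alpha, "validation MSE:", results[best_alpha])
```

Scoring each candidate on data that was not used for fitting is the important part: choosing parameters by training error alone tends to favour the least regularised model.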